This notebook shows how to create maps in a notebook without writing a lot of JavaScript or Python code. It uses folium, which builds on Leaflet.js, a popular JavaScript library for creating interactive maps. Out of the box, folium supports base maps using tilesets from MapBox, OpenStreetMap, and others. It also makes it easy to plot data on maps using GeoJSON and TopoJSON overlays.
You will need to install the following in order to run this notebook.
!pip install folium==0.1.3
!pip install xlrd==0.9.3
!pip install seaborn==0.5.1
!pip install matplotlib==1.4.3
!pip install pandas==0.15.2
In [3]:
%matplotlib inline
In [4]:
import matplotlib.pyplot as plt
In [5]:
import pandas
import seaborn
This notebook uses state government tax collections data from the US Census Bureau. The data are for fiscal year 2013. The data file contains tax collections by US state governments for a variety of tax categories, such as income, property, and sales taxes.
In [6]:
!wget -O 13staxcd.txt http://www2.census.gov/govs/statetax/13staxcd.txt
In [7]:
df = pandas.read_csv('13staxcd.txt', index_col='ST').dropna(axis=1)
# Because, yeah, values are in 1000s of dollars
df = df * 1000
df.head()
Out[7]:
We need a second file that provides descriptions for the tax item codes (the TXX numbers).
In [8]:
!wget -O TaxItemCodesandDescriptions.xls http://www2.census.gov/govs/statetax/TaxItemCodesandDescriptions.xls
In [9]:
tax_codes_df = pandas.read_excel('TaxItemCodesandDescriptions.xls', 'Sheet1', index_col='Item Code')
tax_codes_df.head()
Out[9]:
In [10]:
print('${:,}'.format(df.sum().sum()))
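The `'${:,}'` format string used here relies on the `,` option of Python's format specification, which groups digits in thousands. A quick illustration with an arbitrary amount (not a figure from the Census data):

```python
# Arbitrary example amount, not a figure from the Census data.
amount = 1234567890

# The ',' option in the format spec inserts thousands separators.
formatted = '${:,}'.format(amount)
print(formatted)
```

The same format string is passed to `Series.map` later in the notebook to format a whole column of totals at once.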
According to the data source:
The Annual Survey of State Government Tax Collections (STC) provides a summary of taxes collected by state for 5 broad tax categories and up to 25 tax subcategories. These tables and data files present the details on tax collections by type of tax imposed and collected by state governments.
The only things missing from the data so far are the "5 broad tax categories" and which of the 25 subcategories make up each one. We had to look this up and download another Excel file. There's also this report, which provides some details about tax categorization, but also seems to contradict the Excel spreadsheet. Oh, the humanity.
In [11]:
!wget -O agg_tax_categories.xls http://www2.census.gov/govs/estimate/methodology_for_summary_tabulations.xls
In [12]:
tmp = pandas.read_excel('agg_tax_categories.xls')
tmp[8:21].dropna(how='all').dropna(how='all', axis=1).head()
Out[12]:
After some investigation, we can write a short function to retrieve the major tax category by tax item code.
In [13]:
def category(tax_item):
    '''Return tax category for the tax item code.'''
    if tax_item == 'T01':
        return 'Property Taxes'
    elif tax_item in ['T40', 'T41']:
        return 'Income Taxes'
    elif tax_item in ['T09', 'T10', 'T11', 'T12', 'T13', 'T14', 'T15', 'T16', 'T19']:
        return 'Sales and Gross Receipts Taxes'
    elif tax_item in ['T20', 'T21', 'T22', 'T23', 'T24', 'T25', 'T26', 'T27', 'T28', 'T29']:
        return 'License Taxes'
    return 'Other Taxes'
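The same lookup could also be written as a table rather than a chain of conditions, which keeps the groupings in one place if they ever need to change. This is only a sketch of an alternative; the names `CATEGORY_CODES`, `CODE_TO_CATEGORY`, and `category_from_table` are made up for illustration, and the groupings are copied from the function above.

```python
# Broad category -> tax item codes (groupings copied from category() above).
CATEGORY_CODES = {
    'Property Taxes': ['T01'],
    'Income Taxes': ['T40', 'T41'],
    'Sales and Gross Receipts Taxes': [
        'T09', 'T10', 'T11', 'T12', 'T13', 'T14', 'T15', 'T16', 'T19'],
    'License Taxes': [
        'T20', 'T21', 'T22', 'T23', 'T24', 'T25', 'T26', 'T27', 'T28', 'T29'],
}

# Invert the table once so each lookup is a single dictionary access.
CODE_TO_CATEGORY = {
    code: name
    for name, codes in CATEGORY_CODES.items()
    for code in codes
}

def category_from_table(tax_item):
    '''Return the broad category for a tax item code (hypothetical helper).'''
    # Codes not listed in any group fall back to 'Other Taxes'.
    return CODE_TO_CATEGORY.get(tax_item, 'Other Taxes')
```

Either version can be passed to `index.map`; the table form mainly makes the fallback to 'Other Taxes' explicit.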
Sum all taxes collected by broad category.
In [14]:
# assign broad category to each tax item code
tmp = df.copy()
tmp['Category'] = tmp.index.map(category)
# aggregate taxes collected by each state by broad category
by_category = tmp.groupby('Category').sum()
# sum across all states
totals_by_category = by_category.sum(axis=1)
print(totals_by_category.map('${:,}'.format))
Plot the total taxes collected by broad category.
In [15]:
totals_by_category.plot(kind='pie', labels=totals_by_category.index,
figsize=(10,10), autopct='%.1f%%')
Out[15]:
Here is a violin plot (a combination of boxplot and kernel density plot) that shows the distribution of taxes collected for each category.
In [16]:
data = by_category.T
fig, ax = plt.subplots(figsize=(14,10))
seaborn.violinplot(data, color="Set3", bw=.2, cut=.6,
lw=.5, inner="box", inner_kws={"ms": 6}, ax=ax)
Out[16]:
In [17]:
print(data[['Income Taxes', 'Sales and Gross Receipts Taxes']].describe())
In [18]:
taxes_by_state = df.sum().sort(inplace=False, ascending=False)
taxes_by_state[:10].map('${:,}'.format)
Out[18]:
It may not surprise anyone that California and New York top the list; however, it may surprise some that California collected almost twice as much tax revenue as New York. Here is a bar chart to help visualize the magnitude of taxes collected by state.
In [19]:
fig, ax = plt.subplots(figsize=(12,8))
data = taxes_by_state.reset_index()
data.columns = ['State', 'Taxes']
# plot values in $ billions
seaborn.barplot(data.index, data.Taxes / 1000000000,
ci=None, hline=.1, ax=ax)
ax.set_xticklabels(data.State)
ax.set_ylabel('$ Billions')
ax.set_xlabel('State')
ax.set_title('Taxes Collected by US State Governments, FY 2013')
plt.tight_layout()
We want to overlay our tax data on a map of the United States. To do this, we first combine the broad category totals with the data for the 25 subcategories into a single table indexed by state.
In [20]:
# the aggregate data by broad category
tmp = by_category.T
# make up our own tax item codes for broad categories
codes = ['I','L','O','P','S']
# create complete list of category names
category_names = tax_codes_df.Description.append(
pandas.Series(tmp.columns, index=codes)
)
# merge broad category data with data for 25 subcategories
tmp.columns = codes
data = df.T.merge(tmp, left_index=True, right_index=True)
data.head()
Out[20]:
We created a TopoJSON file of US borders using us-atlas. Creation of this file is beyond our scope here, but you can download it from our GitHub repository.
In [21]:
!wget -O us-states-10m.json https://raw.githubusercontent.com/knowledgeanyhow/notebooks/master/tax-maps/data/us-states-10m.json
In [22]:
us_topo_map = 'us-states-10m.json'
import os
assert os.path.isfile(us_topo_map)
statinfo = os.stat(us_topo_map)
assert statinfo.st_size > 0
Our tax data is indexed by state. We need a way to bind our data to the state geometries in our map. The geometries in our TopoJSON file are keyed by FIPS codes (Federal Information Processing Standard). So we need to obtain the FIPS codes for US states (from the US Census Bureau), and add them to our data.
In [23]:
!wget -O us_state_FIPS.txt http://www2.census.gov/geo/docs/reference/state.txt
In [24]:
fips = pandas.read_csv('us_state_FIPS.txt', delimiter='|', index_col='STUSAB')
fips.head()
Out[24]:
Add FIPS column to our data.
In [25]:
data['FIPS'] = data.index.map(lambda x: fips.loc[x]['STATE'])
data['FIPS'].head()
Out[25]:
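To make the binding step concrete, here is a small standard-library sketch of the same idea: read pipe-delimited rows and build a lookup from state abbreviation to FIPS code. The sample rows are illustrative stand-ins for the Census file. Only the `STATE` and `STUSAB` column names come from the code above; the `STATE_NAME` column is an assumption about the file layout.

```python
import csv
import io

# Illustrative sample in a pipe-delimited layout; STATE_NAME is an assumed column.
sample = io.StringIO(
    "STATE|STUSAB|STATE_NAME\n"
    "01|AL|Alabama\n"
    "06|CA|California\n"
)

reader = csv.DictReader(sample, delimiter='|')

# Convert to int, which mirrors how pandas.read_csv parses a numeric column
# by default (the leading zero is dropped).
abbr_to_fips = {row['STUSAB']: int(row['STATE']) for row in reader}

print(abbr_to_fips['CA'])
```

Whether the codes should be integers or zero-padded strings depends on how the ids are stored in the map file, so it is worth checking that the two sides match before binding.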
Folium utilizes IPython's rich display to render maps as HTML. Here are two functions that use different mechanisms to render a map in a notebook. Either will work in modern browsers.
In [26]:
import folium
from IPython.display import HTML
def inline_map(map):
    """
    Embeds the HTML source of the map directly into the IPython notebook.
    This method will not work if the map depends on any files (json data). Also this uses
    the HTML5 srcdoc attribute, which may not be supported in all browsers.
    """
    map._build_map()
    # escape double quotes so the HTML can sit inside the srcdoc="..." attribute
    return HTML('<iframe srcdoc="{srcdoc}" style="width: 100%; height: 510px; border: none"></iframe>'.format(srcdoc=map.HTML.replace('"', '&quot;')))

def embed_map(map, path="map.html"):
    """
    Embeds a linked iframe to the map into the IPython notebook.
    Note: this method will not capture the source of the map into the notebook.
    This method should work for all maps (as long as they use relative urls).
    """
    map.create_map(path=path)
    return HTML('<iframe src="files/{path}" style="width: 100%; height: 510px; border: none"></iframe>'.format(path=path))
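The `srcdoc` approach only works if the embedded page cannot terminate the attribute early: the attribute value is delimited by double quotes, so any double quotes in the map's HTML have to be escaped as `&quot;`. The sketch below shows that step in isolation. It assumes Python 3 (for the standard `html` module), and `to_srcdoc_iframe` is a hypothetical helper, not part of folium or IPython.

```python
import html

def to_srcdoc_iframe(page_html, height=510):
    '''Wrap an HTML document in an iframe via srcdoc (hypothetical helper).'''
    # html.escape with quote=True escapes &, <, > and both quote characters,
    # so the page cannot break out of the double-quoted attribute.
    escaped = html.escape(page_html, quote=True)
    template = ('<iframe srcdoc="{srcdoc}" '
                'style="width: 100%; height: {height}px; border: none"></iframe>')
    return template.format(srcdoc=escaped, height=height)

snippet = to_srcdoc_iframe('<div id="map"></div>')
print(snippet)
```

The browser decodes the entities when it reads the attribute, so the iframe still receives the original markup.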
Now we create a function that accepts a tax code, creates a basemap of the United States, and adds a TopoJSON overlay with the appropriate state tax data bound to it.
In [27]:
def create_tax_map(tax_code, path='tax_map.html'):
    '''
    Create a base map with tax data bound to a TopoJSON overlay.
    '''
    # lookup tax category name
    tax_name = category_names.loc[tax_code] + ' ($ Millions)'
    # lookup tax data and convert dollars to millions of dollars
    d = data[['FIPS', tax_code]].copy()
    d[tax_code] = d[tax_code] / 1e6
    # compute a color scale based on data values
    max_value = d[tax_code].max()
    color_scale = [max_value * q for q in [0, 0.1, 0.25, 0.5, 0.75, 0.95]]
    # create base map
    map = folium.Map(location=[40, -99], zoom_start=4, width=800)
    # add TopoJSON overlay and bind data
    map.geo_json(geo_path=us_topo_map, data_out='tax_map.json',
                 data=d, columns=d.columns,
                 key_on='feature.id',
                 threshold_scale=color_scale,
                 fill_color='PuBuGn', line_opacity=0.3,
                 legend_name=tax_name,
                 topojson='objects.states')
    map.create_map(path=path)
    return map

inline_map(create_tax_map('T40'))
Out[27]:
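The color scale in `create_tax_map` is a list of six thresholds placed at fixed fractions of the largest value in the selected column. The sketch below isolates that calculation so it can be tried on its own; `threshold_scale` is a hypothetical helper name, and the input value is arbitrary.

```python
def threshold_scale(max_value, fractions=(0, 0.1, 0.25, 0.5, 0.75, 0.95)):
    '''Return color thresholds at fixed fractions of max_value (hypothetical helper).'''
    return [max_value * q for q in fractions]

# Arbitrary maximum, chosen only to make the thresholds easy to read.
scale = threshold_scale(200.0)
print(scale)
```

Because the thresholds are tied to the maximum, a column dominated by one or two very large states will place most other states in the lowest bins. A quantile-based scale would be one alternative if that turns out to hide too much detail.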
Use a widget to choose the tax category and render the map interactively.
In [28]:
from IPython.html import widgets
from IPython.display import display
from IPython.html.widgets import interact
tax_categories = category_names.to_dict()
tax_categories = dict(zip(tax_categories.values(), tax_categories.keys()))
dropdown = widgets.Dropdown(options=tax_categories, value='T40', description='Tax:')
def show_map(tax_code):
    display(inline_map(create_tax_map(tax_code)))
widgets.interact(show_map, tax_code=dropdown)
Out[28]:
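The dropdown is given a dictionary keyed by description with the tax code as the value, which is why the code-to-name mapping is inverted before it is passed in. A dictionary comprehension does the same inversion as the `zip` call above. The two sample entries are illustrative, not read from the Census spreadsheet.

```python
# Illustrative entries only; the notebook builds this mapping from category_names.
code_to_name = {
    'T01': 'Property Taxes',
    'T40': 'Individual Income Taxes',
}

# Swap keys and values so the human-readable name becomes the lookup key.
name_to_code = {name: code for code, name in code_to_name.items()}

print(name_to_code['Property Taxes'])
```

One caveat with either form: if two codes share the same description, only one of them survives the inversion.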